429 research outputs found

    bGWAS: an R package to perform Bayesian genome wide association studies.

    Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary data are available at Bioinformatics online.
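
    The way prior and observed effects are combined can be illustrated with a generic normal-normal model. This is a minimal sketch under stated assumptions, not the bGWAS implementation: effects are treated as z-scores with unit sampling variance, the prior mean is taken as a weighted sum of risk-factor z-scores, and the function names (`prior_effect`, `combine`) and the prior variance are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def prior_effect(risk_factor_z, causal_effects):
    """Prior mean for one SNP: risk-factor z-scores weighted by the
    (assumed known) causal effect of each risk factor on the focal trait."""
    return sum(a * z for a, z in zip(causal_effects, risk_factor_z))


def combine(z_obs, prior_mean, prior_var):
    """Combine an observed z-score with a normal prior (illustrative only).

    Returns (Bayes factor, posterior mean, direct effect).
    """
    # Marginal density of z under the prior (variance 1 + prior_var)
    # versus under the null hypothesis of no effect, N(0, 1).
    bf = normal_pdf(z_obs, prior_mean, 1.0 + prior_var) / normal_pdf(z_obs, 0.0, 1.0)
    # Conjugate normal-normal update with unit sampling variance.
    post_var = 1.0 / (1.0 / prior_var + 1.0)
    post_mean = post_var * (prior_mean / prior_var + z_obs)
    # Part of the observed effect not explained by the prior.
    direct = z_obs - prior_mean
    return bf, post_mean, direct


mu = prior_effect([2.0, -1.0], [0.5, 0.2])
bf, post_mean, direct = combine(3.0, mu, 1.0)
print(f"prior={mu:.2f} BF={bf:.2f} posterior={post_mean:.2f} direct={direct:.2f}")
```

    In this toy setting the prior points in the same direction as the observed effect, so the Bayes factor exceeds 1 and the posterior effect is shrunk toward the prior mean.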

    SQC: secure quality control for meta-analysis of genome-wide association studies.

    Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish larger consortia in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate its efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost compared with the 10-month time span usually observed for completing the QC procedure, including logistics. SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. Supplementary data are available at Bioinformatics online.

    Genome-Wide Association between Transcription Factor Expression and Chromatin Accessibility Reveals Regulators of Chromatin Accessibility.

    To better understand genome regulation, it is important to uncover the role of transcription factors in the process of chromatin structure establishment and maintenance. Here we present a data-driven approach to systematically characterise transcription factors that are relevant for this process. Our method uses a linear mixed modelling approach to combine datasets of transcription factor binding motif enrichments in open chromatin and gene expression across the same set of cell lines. Applying this approach to the ENCODE dataset, we confirm known transcription factors and implicate numerous novel ones that play a role in the establishment or maintenance of open chromatin. In particular, our approach rediscovers many factors that have been annotated as pioneer factors.

    Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
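
    The two gene-level statistics and the idea behind Fisher-style pathway scoring can be sketched as follows. This is an illustration under stated assumptions, not Pascal's code: it shows the classical Fisher method for independent p-values rather than the modified variant the abstract refers to, it does not model linkage disequilibrium between SNPs, and the function names are hypothetical.

```python
import math


def gene_statistics(z_scores):
    """Sum and maximum of per-SNP chi-squared statistics (squared z-scores).

    Their null distributions depend on the correlation between SNPs,
    which this sketch does not model.
    """
    chi2 = [z * z for z in z_scores]
    return sum(chi2), max(chi2)


def chi2_sf_even(x, df):
    """Survival function of a chi-squared variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Classical Fisher method: -2 * sum(ln p) follows a chi-squared
    distribution with 2k degrees of freedom for k independent p-values."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even(stat, 2 * len(p_values))


sum_stat, max_stat = gene_statistics([1.0, -2.0, 0.5])
stat, combined_p = fisher_combine([0.05, 0.05])
print(sum_stat, max_stat, stat, combined_p)
```

    Because the combined statistic uses every gene-level p-value, no significance threshold has to be chosen to decide which genes count as pathway members.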

    Exploiting the mediating role of the metabolome to unravel transcript-to-phenotype associations.

    Despite the success of genome-wide association studies (GWASs) in identifying genetic variants associated with complex traits, understanding the mechanisms behind these statistical associations remains challenging. Several methods that integrate methylation, gene expression, and protein quantitative trait loci (QTLs) with GWAS data to determine their causal role in the path from genotype to phenotype have been proposed. Here, we developed and applied a multi-omics Mendelian randomization (MR) framework to study how metabolites mediate the effect of gene expression on complex traits. We identified 216 transcript-metabolite-trait causal triplets involving 26 medically relevant phenotypes. Among these associations, 58% were missed by classical transcriptome-wide MR, which only uses gene expression and GWAS data. This allowed the identification of biologically relevant pathways, such as between ANKH and calcium levels mediated by citrate levels, and between SLC6A12 and serum creatinine through modulation of the levels of the renal osmolyte betaine. We show that the signals missed by transcriptome-wide MR are found thanks to the increase in power conferred by integrating multiple omics layers. Simulation analyses show that with larger molecular QTL studies and in the case of mediated effects, our multi-omics MR framework outperforms classical MR approaches designed to detect causal relationships between single molecular traits and complex phenotypes.

    A joint view on genetic variants for adiposity differentiates subtypes with distinct metabolic implications.

    The problem of the genetics of related phenotypes is often addressed by analyzing adjusted-model traits, but such traits warrant cautious interpretation. Here, we adopt a joint view of adiposity traits in ~322,154 subjects (GIANT consortium). We classify 159 signals associated with body mass index (BMI), waist-to-hip ratio (WHR), or WHR adjusted for BMI (WHRadjBMI) at P < 5 × 10⁻⁸ into four classes based on the direction of their effects on BMI and WHR. Our classes help differentiate adiposity genetics with respect to anthropometry, fat depots, and metabolic health. Class-specific Mendelian randomization reveals that variants associated with both WHR decrease and BMI increase are linked to metabolically rather favorable adiposity through beneficial hip fat. Class-specific enrichment analyses implicate digestive systems as a pathway in adiposity genetics. Our results demonstrate that WHRadjBMI variants capture relevant effects of "unexpected fat distribution given the BMI" and that a joint view of the genetics underlying related phenotypes can inform on important biology.

    Improving polygenic prediction with genetically inferred ancestry.

    Genome-wide association studies (GWASs) have demonstrated that most common diseases have a strong genetic component arising from many genetic variants, each with a small effect size. GWAS summary statistics have allowed the construction of polygenic scores (PGSs) estimating part of the individual risk for common diseases. Here, we propose to improve PGS-based risk estimation by incorporating genetic ancestry derived from genome-wide genotyping data. Our method involves three cohorts: a base (or discovery) cohort for association studies, a target cohort for phenotype/risk prediction, and a map cohort for ancestry mapping. Successively, (1) it generates for each individual in the base and target cohorts a set of principal components based on the map cohort, called mapped PCs; (2) it associates the phenotype with the mapped PCs in the base cohort; and (3) it uses the mapped PCs in the target cohort to generate a phenotypic predictor called the ancestry score. We evaluated the ancestry score by comparing a predictive model using a PGS with one combining a PGS and an ancestry score. First, we performed simulations and found that the ancestry score has a greater impact on traits that correlate with ancestry-specific variants. Second, we showed, using UK Biobank data, that the ancestry score improves genetic prediction for our nine phenotypes to very different degrees. Third, we performed simulations and found that the more heterogeneous the base and target cohorts, the more beneficial the ancestry score is. Finally, we validated our approach under realistic conditions with UK Biobank as the base cohort and Swiss individuals from the CoLaus|PsyCoLaus study as the target cohort.

    Evaluation and application of summary statistic imputation to discover new height-associated loci.

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power of 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.
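
    The core idea of imputing summary statistics directly can be written as a conditional expectation under a multivariate normal model. The formulation below is the commonly used textbook form, not necessarily the exact estimator of this study (which additionally accounts for variable sample size across SNVs). All symbols are assumptions introduced for illustration: $z_t$ is the vector of z-scores at typed variants, $C$ their correlation (LD) matrix estimated from a reference panel, $c$ the vector of correlations between the untyped variant $u$ and the typed variants, and $\lambda$ a small regularisation constant.

```latex
\hat{z}_u = c^{\top} \left( C + \lambda I \right)^{-1} z_t ,
\qquad
\hat{r}^2_u = c^{\top} \left( C + \lambda I \right)^{-1} c
```

    Here $\hat{r}^2_u$ serves as an imputation quality measure: it is close to 1 when the untyped variant is well tagged by the typed variants and close to 0 when it is not.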

    FADS3 is a Δ14Z sphingoid base desaturase that contributes to gender differences in the human plasma sphingolipidome.

    Sphingolipids (SLs) are structurally diverse lipids that are defined by the presence of a long-chain base (LCB) backbone. Typically, LCBs contain a single Δ4E double bond (DB) (mostly d18:1), whereas the dienic LCB sphingadienine (d18:2) contains a second DB at the Δ14Z position. The enzyme introducing the Δ14Z DB is unknown. We analyzed the LCB plasma profile in a gender-, age-, and BMI-matched subgroup of the CoLaus cohort (n = 658). Sphingadienine levels showed a significant association with gender, being on average ∼30% higher in females. A genome-wide association study (GWAS) revealed variants in the fatty acid desaturase 3 (FADS3) gene to be significantly associated with the plasma d18:2/d18:1 ratio (-log10(p) = 7.9). Metabolic labeling assays, FADS3 overexpression and knockdown approaches, and plasma LCB profiling in FADS3-deficient mice confirmed that FADS3 is a bona fide LCB desaturase and is required for the introduction of the Δ14Z double bond. Moreover, we showed that FADS3 is required for the conversion of the atypical cytotoxic 1-deoxysphinganine (1-deoxySA, m18:0) to 1-deoxysphingosine (1-deoxySO, m18:1). HEK293 cells overexpressing FADS3 were more resistant to m18:0 toxicity than WT cells. In summary, using a combination of metabolic profiling and GWAS, we identified FADS3 to be essential for forming Δ14Z DB containing LCBs, such as d18:2 and m18:1. Our results reveal FADS3 as a Δ14Z LCB desaturase, thereby disclosing the last missing enzyme of the SL de novo synthesis pathway.

    Risk prediction of developing venous thrombosis in combined oral contraceptive users.

    Venous thromboembolism (VTE) is a complex multifactorial disease influenced by genetic and environmental risk factors. An example of the latter is the regular use of combined oral contraceptives (CC), which increases the risk of developing VTE 3- to 7-fold, depending on estrogen dosage and the type of progestin present in the pill. One out of 1,000 women using CC develops thrombosis, often with life-long consequences; a risk assessment is therefore necessary prior to such treatment. Currently known clinical risk factors associated with VTE development in general are routinely checked by medical doctors; however, they are far from sufficient for risk prediction, even when combined with genetic tests for Factor V Leiden and Factor II G20210A variants. Thus, clinical and notably genetic risk factors specific to the development of thrombosis associated with the use of CC in particular should be identified. Stepwise (logistic) model selection was applied to a population of 1622 women using CC, about half of whom (794) had developed a thromboembolic event while using contraceptives. A total of 46 polymorphisms and clinical parameters were tested in the model selection, and a specific combination of 4 clinical risk factors and 9 polymorphisms was identified. Among the 9 polymorphisms, there are two novel genetic polymorphisms (rs1799853 and rs4379368) that had not previously been associated with the development of a thromboembolic event. This new prediction model outperforms (AUC 0.71, 95% CI 0.69-0.74) previously published models for general thromboembolic events in a cross-validation setting. Further validation in independent populations should be envisaged. We identified two new genetic variants associated with VTE development, as well as a robust prediction model to assess the risk of thrombosis for women using combined oral contraceptives. This model outperforms current medical practice as well as previously published models and is the first model specific to CC use.
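
    The AUC reported for the prediction model is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it by direct pairwise comparison. It is a generic illustration with made-up risk scores, not the study's data or code, and the function name `auc` is hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has
    the higher score; ties count as half a win.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up predicted risks for two cases and two controls.
print(auc([0.9, 0.3], [0.5, 0.1]))
```

    A value of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation.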